Markov Games: Receding Horizon Approach
Authors
Abstract
We consider a receding horizon approach as an approximate solution to two-person zero-sum Markov games with infinite-horizon discounted cost and average cost criteria. We first present error bounds from the optimal equilibrium value of the game when both players take correlated equilibrium receding horizon policies that are based on exact or approximate solutions of receding finite-horizon subgames. Motivated by the worst-case optimal control of queueing systems by Altman [1], we then analyze error bounds when the minimizer plays the (approximate) receding horizon control and the maximizer plays the worst-case policy. We give three heuristic examples of the approximate receding horizon control. We extend “rollout” by Bertsekas and Castanon [9] and “parallel rollout” and “hindsight optimization” by Chang et al. [13, 16] to the Markov game setting within the framework of the approximate receding horizon approach and analyze their performance. With the rollout/parallel rollout approaches, the minimizing player seeks to improve the performance of a single heuristic policy it rolls out, or to dynamically combine multiple heuristic policies in a set so as to improve the performance of all of the heuristic policies simultaneously, under the assumption that the maximizing player has chosen a fixed worst-case policy. Given ε > 0, we give the value of the receding horizon that guarantees that the parallel rollout policy with that horizon, played by the minimizer, dominates any heuristic policy in the set by ε. With the hindsight optimization approach, the minimizing player makes a decision based on its expected optimal hindsight performance over a finite horizon. We finally discuss practical implementations of the receding horizon approaches via simulation.
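The receding horizon idea described above can be illustrated with a small sketch: at the current state, the minimizer solves an H-step subgame by backward induction, where each stage reduces to a matrix game solved by linear programming, and then plays the resulting first-stage mixed action. The names, array shapes, and use of SciPy's LP solver below are illustrative assumptions, not the paper's implementation; the terminal value V_tail is where a rolled-out heuristic policy's value would enter in the rollout/parallel rollout variants.

```python
# A minimal sketch (not the paper's implementation) of a receding horizon
# policy for a two-person zero-sum Markov game with discounted cost.
# Assumed (hypothetical) data layout for a small finite game:
#   cost[s, a, b]   stage cost paid by the minimizer
#   P[s, a, b, s2]  transition probabilities
import numpy as np
from scipy.optimize import linprog


def matrix_game(G):
    """Value and minimizer's optimal mixed strategy of the matrix game
    min_x max_b sum_a x[a] * G[a, b], solved as a linear program."""
    m, n = G.shape
    c = np.r_[np.zeros(m), 1.0]                    # minimize the game value v
    A_ub = np.hstack([G.T, -np.ones((n, 1))])      # x^T G[:, b] <= v for all b
    A_eq = np.r_[np.ones(m), 0.0].reshape(1, -1)   # probabilities sum to 1
    res = linprog(c, A_ub=A_ub, b_ub=np.zeros(n),
                  A_eq=A_eq, b_eq=[1.0],
                  bounds=[(0, None)] * m + [(None, None)], method="highs")
    return res.x[-1], res.x[:m]


def receding_horizon_action(s, cost, P, gamma, H, V_tail=None):
    """Minimizer's mixed action at state s from an H-step lookahead.

    V_tail approximates the value of the game beyond the horizon; in the
    rollout / parallel rollout variants it would be (an estimate of) the
    value of the heuristic policy or policies being rolled out.
    """
    S = cost.shape[0]
    V = np.zeros(S) if V_tail is None else np.asarray(V_tail, dtype=float)
    pi_first = None
    for k in range(H - 1, -1, -1):                 # backward induction
        V_next = np.empty(S)
        for st in range(S):
            G = cost[st] + gamma * (P[st] @ V)     # stage game, shape (A, B)
            V_next[st], x = matrix_game(G)
            if k == 0 and st == s:
                pi_first = x                       # mixed action actually played
        V = V_next
    return pi_first
```

In use, the controller would call receding_horizon_action at every decision epoch and sample an action from the returned distribution; choosing V_tail as the value of a single fixed heuristic, or as the state-wise best value over a set of heuristics, corresponds (under the stated assumptions) to the rollout and parallel rollout variants discussed above.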
Similar articles
Receding Horizon Based Control of Disturbed Upright Balance with Consideration of Foot Tilting (Research Note)
In some situations, when an external disturbance occurs, humans can rock stably backward and forward by lifting the toe or the heel to keep upright balance without stepping. Many control schemes have been proposed for standing balance control under external disturbances without stepping, but most of them consider only a flat-foot phase. In this paper a framework is pr...
Approximate Receding Horizon Approach for Markov Decision Processes: Average Reward Case
We consider an approximation scheme for solving Markov Decision Processes (MDPs) with countable state space, finite action space, and bounded rewards that uses an approximate solution of a fixed finite-horizon sub-MDP of a given infinite-horizon MDP to create a stationary policy, which we call “approximate receding horizon control”. We first analyze the performance of the approximate receding h...
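A similarly minimal, hypothetical sketch of the single-controller version described in this related work: solve an H-step finite-horizon sub-MDP by backward induction and, at every state, act with its first-stage greedy action, which yields a stationary "approximate receding horizon control". The array names and shapes are assumptions for a small finite MDP (the paper itself treats countable state spaces), and V_tail stands in for an approximate tail value such as that of a heuristic base policy.

```python
# Assumed layout: reward[s, a] bounded one-step rewards, P[s, a, s2] transitions.
import numpy as np


def receding_horizon_control(reward, P, H, V_tail=None):
    """Stationary policy: at each state, the first-stage optimal action of an
    H-step finite-horizon sub-MDP whose tail is approximated by V_tail."""
    S = reward.shape[0]
    V = np.zeros(S) if V_tail is None else np.asarray(V_tail, dtype=float)
    for _ in range(H):                  # backward induction over H stages
        Q = reward + P @ V              # Q[s, a] = r(s, a) + E[V(next state)]
        V = Q.max(axis=1)
    return Q.argmax(axis=1)             # one greedy first-stage action per state
```

No discount factor appears here, matching the average-reward setting of this related work; for the discounted case the continuation term P @ V would be scaled by the discount factor.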
Receding Horizon Control for Constrained Jump Bilinear Systems
In this paper, a receding horizon control strategy for a class of bilinear discrete-time systems with Markovian jumping parameters and constraints is investigated. Specifically, the stochastic jump system under consideration involves control and state multiplicative noise and partly unknown transition probabilities (TPs). The receding horizon formulation adopts an on-line optimization paradigm ...
Design of Distributed Optimal Adaptive Receding Horizon Control for Supply Chain of Realistic Size under Demand Disturbances
Keywords: supply chain network, receding horizon control, demand, move suppression term. Supply chain networks are the interconnection and dynamics of a demand network. Example subsystems, referred to as stages, include raw materials, distributors of the raw materials, manufacturers, distributors of the manufactured products, retailers, and customers. The main objectives of the control strategy for the s...
ARES: Adaptive Receding-Horizon Synthesis of Optimal Plans
We introduce ARES, an efficient approximation algorithm for generating optimal plans (action sequences) that take an initial state of a Markov Decision Process (MDP) to a state whose cost is below a specified (convergence) threshold. ARES uses Particle Swarm Optimization, with adaptive sizing for both the receding horizon and the particle swarm. Inspired by Importance Splitting, the length of t...